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ABSTRACT 

This paper proposes a novel Gaussian process approach to 
fault removal in time-series data. Fault removal does not delete 
the faulty signal data but, instead, massages the fault from 
the data. We assume that only one fault occurs at any one 
time and model the signal by two separate non-parametric 
Gaussian process models for both the physical phenomenon 
and the fault. In order to facilitate fault removal we introduce 
the Markov Region Link kernel for handling non-stationary 
Gaussian processes. This kernel is piece-wise stationary but 
guarantees that functions generated by it and their derivatives 
(when required) are everywhere continuous. We apply this 
kernel to the removal of drift and bias errors in faulty sensor 
data and also to the recovery of EOG artifact corrupted EEG 
signals. 


I. Introduction 

Gaussian processes (GPs) are experiencing a resurgence 
of interest. Current applications are in diverse fields such 
as geophysics, medical imaging, multi-sensor fusion a and 
sensor placement HI. A GP is often thought of as a “Gaussian 
over functions” j?!. It can be used to construct a distribution 
over functions via a prior on the function’s values. The prior is 
specified through a positive-definite kernel, which determines 
the covariance between two outputs as a function of their 
corresponding inputs. A GP is fully described by its mean and 
covariance functions. Suppose we have a set of training data 
D = {{xi,yi),..., (xn, t/n)} drawn from a noisy process: 

2/i = f(xi) + ei (1) 

where f(xi) is the real process and is zero-mean Gaussian 
with variance Eor convenience both inputs and outputs 
are aggregated into X = {xi,..., Xn} and Y = {yi ,..., y^} 
respectively. The GP estimates the value of the function / 
at sample locations X* = {x^i^. ., x^m}- The basic GP 
regression equations are given in 12l: 

/. =K{X,,X)[K{X,X)+all]-^Y , ( 2 ) 

Cov(/*) - K{X,,X)[K{X,X) + all]-'-K{X,,Xf 

( 3 ) 


which can be tuned to suit the application. The kernel function 
is chosen according to known properties of the underlying 
processes, such smoothness and stationarity. There are many 
existing kernels to choose from (see, for example, Q). We 
will develop a new kernel for our fault recovery algorithm. 
The prior mean is traditionally set to zero and we follow this 
convention. However, the results in this paper can be readily 
generalised to non-zero prior means. The term captures 
the noise in Eqn[^ 

The GP parameters 6 (which includes a and hyperparame¬ 
ters associated with the covariance function) can be inferred 
from the data through Bayes’ rule 


p{e\Y,x) 


p{Y\x,e) 
P{Y I X) 


p(e). 


( 4 ) 


The parameters are usually given a vague prior distribution 

p{e). 

The paper is organised as follows. Section [n| demonstrates 
how fault recovery can be achieved using non-stationary 
Gaussian processes. Section III reviews current covariance 


functions for modelling non-stationary processes. Then, Sec¬ 
tion m presents the problem description as a piece-wise 
stationary problem with boundary constraints at the onset of 
faults. Section |V]presents the MRL kernel for functions which 
are continuous boundaries and this is extended to cases where 
function derivatives are continuous at region boundaries in 
Section |VT] In Sections IVIII and IVIIII we demonsttate the 
efficacy of our approach on simulated data from a faulty 
sensor target estimation problem as well as a dataset involving 
EOG artifact corrupted EEG signals. Finally, we conclude in 
Section |I3 


II. Gaussian Process Faulty Recovery 

We assume that a faulty signal is the linear composition of 
the real physical phenomenon of interest and a deterministic 
offset induced by a fault either in the sensor measurement or in 
the physical phenomenon itself. Again, the signal is assumed 
to be drawn from a noisy process: 


where /* is the marginal posterior mean at X^ and Cov(/») 
is the corresponding covariance. The prior covariance at X* 
is K{X^,X^) where the covariance matrix K{X^, X^) has 
elements Kij = K{xi,Xj;dk)- The function K is called the 
kernel function. These functions come with parameters, 9k, 


Vi = f{xi) + e{xi) Y a ( 5 ) 

where f{xi) is the real process, e{xi) is the fault process and 
Ei is zero-mean Gaussian with variance cr^. 

Again, both inputs and outputs are aggregated into X = 
{xi,..., Xn} and r = {yi,..., y„} respectively. The GP 






regression equations, and are modified so that both 
the real and fault processes canbe inferred separately; 

A =Kf{X„X)[Ks{X,X) + all]-^Y , 

Cov(/.) =Kf{X,,X,)-Kf{X,,X)x 

[K,iX,X)+all]-^Kf{X,,Xf , 

e. =K4X„X)[Ks{X,X) + ahr'^Y , 

Cov(/.) =K,{X,,X,)-K,{X,,X)x 

[K,{X,X)+all]-^K,{X„Xf 

where Ks{X,X) = Kf{X,X) + Ke{X,X). /* is the 
marginal posterior mean at X* and Cov(/*) is the correspond¬ 
ing covariance of the real process, e* and Cov(e*) are the 
marginal posterior mean and corresponding covariance of the 
fault process, respectively. The prior covariances at X* are 
Kf (X*, X*) and (X*, X*) for the real process and the fault 
process, respectively, where, again, each covariance matrix 
is generated by a kernel function. We may assign different 
kernels to the real and fault processes. 

We shall consider two kinds of faults in this paper, drift and 
bias (see Figure [^. Although, our approach can be extended 
to arbitrary fault types. Both faults are temporary, so they 
have a start time and an end time. The fault process is 
assumed to be zero outside of this time interval. Drift faults 
are gradual. Starting from zero error they grow over time and 
then either shrink back to zero gradually or snap back to zero 
instantaneously. Bias faults are severe and immediate. They 
induce a significant error in the signal at onset which persists 
until the fault subsides at which point the signal snaps back 
onto the real process. The drift fault is continuous at the start 
whereas the bias fault is discontinuous. 




(a) Bias (b) Drift 

Fig. 1. Typical real process and faulty observations for both bias and drift 
faults. 


In order to model the drift fault we develop a kernel 
which guarantees that the fault is continuous at onset. We 
call this kernel the continuous, conditionally independent (or 
CCI) kernel. The CCI kernel has applications outside of 
fault recovery. This non-stationary kernel splices two or more 
locally stationary kernels together whilst preserving function 
continuity throughout. Consequently, we present this kernel 
thoroughly for the first time. 

III. Non-Stationary Kernels 

Many applications use stationary covariance functions for 
which the kernel is a function of the distance between the 


input points. Stationary covariance functions are appealing 
due to their intuitive interpretation and their relative ease of 
construction. Unfortunately, stationary GP functions are not 
applicable in applications where there are input-dependent 
variations in the model hyperparameters (e.g. length-scale, 
amplitude) and kernel families. Consequently, non-stationary 
GP functions have been proposed, such as the neural network 
kernel m and the Gibbs kernel la. 

Methods for deriving non-stationary kernels from station¬ 
ary kernels have also been proposed. Perhaps the earliest 
approach was to assume a stationary random field within a 
moving window IJl. This approaches works well when the 
non-stationarity is smooth and gradual. It fails when sharp 
changes in the kernel structure occur. An alternative solution 
is to introduce an arbitrary non-linear mapping (or warping) 
u{x) of the input x and then apply a stationary covariance 
function in the u-space M- Unfortunately, this approach does 
not handle sharp changes between different locally applied 
kernels very well fl. The mixture of GPs IfTOll approach uses 
the EM algorithm to simultaneously assign GP mixtures to 
locations and optimise their hyperparameters. Although the 
mixture approach can use arbitrary local GP kernels, it does 
not guarantee function continuity over GP kernel transitions. 
Paciorek ||6| proposes a non-stationary GP kernel which guar¬ 
antees continuity over region boundaries. Unfortunately, this 
approach requires that the local, stationary kernels belong to 
the same family. 


A. Example: The Gibbs Non-Stationary Kernel 
Gibbs m, Q derived the covariance function; 


K{x,x') = F[ 


2ld{x)ld{x') 


1/2 


exp 


-E 


{Xd - x'df 
^ ^li.x) + ll{x') 


( 6 ) 


where each length-scale, Ifx), is an arbitrary positive function 
of X and D is the dimensionality of x. If the length-scale varies 
rapidly then the covariance drops off quite sharply due to the 
pre-factor in Eqn As a consequence the inferred function 
estimate can be quite uncertain at length-scale boundaries. 
This is demonstrated in Eigure |^a) for which the length-scale 
changes from l{x) = 35 for x < 130 to l{x) = 15 for x > 130. 
Eurther, the Gibbs kernel does not guarantee that functions 




(a) (b) 

Fig. 2. Part (a) shows a function and its estimate obtained using the Gibbs 
Non-stationary Gaussian process kernel. Also, (b) a random sample drawn 
from the Gibbs non-stationary kernel typically showing a discontinuity where 
the length-scale changes. 


generated by the kernel are continuous. Figure |^b) shows a 











typical sample drawn from the posterior Gaussian distribution 
represented in Figure |^a). 


B. Example: Warping of the Input Space 

This example demonstrates the limitation of modelling 
piece-wise stationary functions by warping the input space as 
proposed by M- Figures and show a continuous wedge 
function with low and high signal-to-noise (SNR) respectively. 
The figures also show the mean and first standard deviation 
of two models: 

• a warped squared exponential 

• two functions, each generated from a squared exponential 
kernel each side of a boundary at a: = 100, and continu¬ 
ous there. 



Fig. 3. Low SNR: The left panel shows the ground truth (red line), 
observations (red dots) and GP estimate with mean (black line) and first 
standard deviation (grey region). The other panels show the distributions 
over length scales, Li and L 2 , infeiTed for the piece-wise plots each side 
of X = 100. 


For low SNR the warping approach can smooth over fea¬ 
tures of high curvature such as the wedge apex. For high SNR 
the warping kernel produces a “bell” estimate as it is forced 
to fit a smooth kernel at the apex. 

In many applications a completely different GP function 
may be required to model different regions within the space of 
interest and the family of non-stationary covariance functions 
currently available in the literature may be too restrictive 
to model these problems especially when there are function 
continuity conditions at region boundaries. We will show how 
arbitrary stationary GP kernels can be combined to form 
non-stationary GP covariance priors which preserve function 
continuity. We shall call the new kernel the Markov Region 
Link (MRL). 



Fig. 4. High SNR: The left panel shows the ground truth (red line), 
observations (red dots) and GP estimate with mean (black line) and first 
standard deviation (grey region). The other panels show the distributions 
over length scales, Li and L 2 , inferred for the piece-wise plots each side 
of X = 100. 


IV. Problem Description 

We will assume that a domain can be partitioned such that 
within regions (tiles) the process is stationary 0. Furthermore, 
each region may be modelled by kernels from different fam¬ 
ilies. For example, one region may be modelled by a Matern 
kernel whereas a neighbouring region may be modelled by a 
mixture of squared exponential and periodic kernels. We do 
not assume that the functions generated by these kernels are 
independent between regions and, although we desire sharply 
changing GP kernels or hyperparameters at region boundaries, 
we would also like to preserve function continuity at the 
boundaries. 

Two regions are labelled i?i and R 2 and collectively they 
form the global region R. A function over R is inferred at 
sample locations X* = ..., a;*^} given training data 

at locations X = {xi,..., a:„}. However, the training data 
locations are partitioned between the regions and the region 
boundary. [^Let Xr be the locations internal to region Rr and 
let Xb he the boundary. Then: 

X = X 1 UXBUX 2 . 

We will assume that the function can be modelled using a 
stationary GP in each region and endeavour to design a global 
GP covariance prior which preserves the individual region 
kernels. We will also endeavour to preserve function continuity 
where desired across region boundaries including, for example, 
function continuity or continuity of function derivatives. Thus, 
the necessary conditions are: 

*In ID problems the region boundary is often referred to as the change- 
point. 


















1) The global kernel K preserves the individual region 
kernels, Kr. That is, K{X,X) = Kr{X,X) for all 
X C Xj. U Xb and all regions r. 

2) The global kernel preserves function continuity, or 
derivative continuity, at the boundary. 

Proposition 1: If two regions, labelled 1 and 2, are joined 
at the boundary Xb and a function defined over the regions is 
modelled by Ki in region Ri and K 2 in i ?2 and the function 
is continuous at the boundary then: 

K^{Xb,Xb) = K2{Xb,Xb) = Kb ■ 

The boundary covariance, Kb, is a hyperparameter which can 
be learned from the training data. 

V. The Markov Link Kernel 

We assume that the processes internal to each region are 
conditionally independent given the process at the boundary 
B. The corresponding graphical model is shown in Figure 
and f{Xi) and f{X 2 ) are the processes internal to the regions 
labelled 1 and 2 and /{Xb) is the process at the boundary. 
The process in region 1 and at the boundary is modelled using 



Fig. 5. Non-stationary GP prior graphical model. 

the GP kernel Ki. The rows and columns of Ki correspond 
to the stacked vector Oi — {Xi, Xb)'^' 

K,{X,,X,) K,iX,,XBV\ 

K^iX,,XB) K^iXB,XB) ) ■ 

Similarly, the process in region 2 and at the boundary is 
modelled using the GP kernel K 2 where the row and columns 
correspond to the stacked vector O 2 = {Xb,X 2 )^'. 

K2{Xb,Xb) K2iX2,XBf\ 
K2{X2 ,Xb) K2{X2,X2) ) ■ 

Of course, if the kernels both accurately model the prior 
covariance of the process at the boundary then: 

Ki{Xb,Xb) = K 2 {Xb,Xb) = Kb . 

So we condition both Ki and K 2 on Kb to yield Kf and iTJ 
respectively: 

K*iXi,X2)=KiiXi,Xi)+Gi[KB-Ki{XB,XB)]G^ , 

K*{X2,X2)=K2iX2,X2)+ G 2 [Kb - K2{Xb,Xb)]G^ 


where Gi = KiiXi, Xb)Ki{Xb, Xb)-^ and G 2 = 
K 2 {X 2 ,Xb)K 2 {Xb,Xb)~^- The global prior covariance is 
then: 

/Kt{Xi,Xi) KI{X^,XbV d \ 
K=\kI{Xi,Xb) Kb KI{X 2 ,XbY\ . 

\ K^{X 2 ,Xb) K^{X 2 ,X 2 ) j 

where the rows and columns correspond to the stacked vector 
O = {Xi,XB,X 2 y■ The cross-terms, D, are: 

D^Coy{f*{X,)J*{X 2 )) 

where /* and /| are the region function values conditioned 
on the function at the boundary: 

f*(X,) = K,{X,,XB)K-^fiXB) , (7) 

f*{X 2 ) = K 2 {X 2 ,XB)Ks^f{XB) . (8) 

Since Cov(/(2fB),/(X b)) = Kb then: 

D = GiKbGI . 

As a corollary of this approach we can derive a Gaussian 
process kernel for ID signals with a change point at xb'- 
Corollary 1: If Ki and K 2 are two stationary GP kernels 
(not necessarily from the same family) which model region 1 
and region 2 respectively, and 9i and 62 are their hyperparam¬ 
eters, then: 

K{xx,X2\01,62) 

' Ki{xi,X2-,ei) + gi{xi)\KB - Ki{XB,XB-,ei)]gi{x2)'^ 
if xi,a ;2 < Xb 

K2{xi,X2-,e2) + g2{xi)[KB - K2{XB,XB\e2)\g2{X2)'^ 
if Xi,X2 > Xb 
gi{xi)KBg2{X2)'^ 
otherwise . 

where: 

gi{xi) = Ki{xi,XB-,ei)K 2 {XB,XB-,ei)-^ , 

' 72 ( 2 : 2 ) = K 2 {X 2 ,XB',d 2 )K 2 {XB,XB',d 2 )~^ ■ 

To demonstrate the new kernel we return to the problem in 
Figure]^ Using identical hyperparameters and observations as 
in Figure the function estimate obtained using the Markov 
Region Link approach is shown in Figure 

VI. Derivative Boundary Conditions 

So far, we have developed covariance functions which 
preserve function continuity at the boundary. The approach 
can be extended to assert function derivative continuity at the 
boundary. The covariance between a function and any of its 
derivatives can be determined from the GP kernel Q. For 
example, the prior covariance between the function and its 
first derivative is: 

[dK{X,Y)],, 4 Cov 

_ dK{xi,Xj) 
dxi 






Fig. 6. Function and estimate obtained using conditionally independent 
function segments described by stationary Gibbs Gaussian process kernels 
joined at the boundary. 

where Xi G X and Xj G Y. The covariance between the 
derivatives is: 

^ d^K{x,,Xj) 
dxidxj 


to construct a global prior for processes which are condition¬ 
ally independent in each region. This is done by defining Kb 
as follows: 

_fK,{XB,XB) [dK,iXB,XB)f\ 

{dKi{XB,XB) ddKiiXB,XB) j 

and using K[ and K 2 defined above in place of Ki and K 2 
in Section lY] 



Fig. 7. Function estimated using two stationary Gibbs kernels joined at 
X = 130 with the constraint that the function first derivative is continuous at 
X = 130. 


The derivative variance at Xi can be obtained by setting Xj 
to Xi in Eqn In our notation dK{X,Y) denotes partial 
derivative with respect to the first parameter, in this case X 
and ddK{X, Y) denotes double differentiation with respect to 
both X and then Y. 

These relationships can be used to dehne non-stationary GP 
covariance priors which impose continuous function deriva¬ 
tives at region boundaries. The prior mean and covariance for 
both the regional and global priors are augmented to include 
the function derivative. For example, if the hrst derivative is 
added to the prior then the prior covariances for regions Ri 
and i ?2 become: 


( Ki{Xi,Xi) A'i(Xi,Xs) 
ifi(Xs,Xi) Ki{XB,XB) 

\dKi{XB,Xi) dKi{XB,XB) 


[dKi{XB,XB)f 
ddKi{XB,XB)) 


and: 


( K2{Xb,Xb) 
K^=\ dK2iXB,XB) 
\[dK2iXB,X2)f 


[dK2{XB,XB)r K2 {Xb,X2)\ 
ddK2 {Xb,Xb) dK2{XB,X2)\ 
[dK[XB,X2)Y' K2{X2,X2) j 


The rows and columns in K[ correspond to the stacked 
vector Oi = (Vi, where d{XB) denotes the 

function derivative at X^. Similarly, the rows and columns in 
K 2 correspond to the stacked vector O 2 = (X^, (^(Xb), X2). 
Consequently, we can use the approach outlined in Section [V] 

^We shall use prime to denote the augmented covariances. 


Figure shows the effect that the derivative constraint can 
have on the function estimate. Two stationary Gibbs kernels 
are used with l{x) — 35 for x < 130 and l{x) = 15 for 
X > 130 as in Figure]^ Clearly, the approach which imposes a 
continuous first derivative on the GP model produces a tighter 
function estimate at a; = 130. 

VII. Application 1: Target Estimation with Faulty 
Sensors 

We shall use a GP to estimate a target’s position over time 
t as it is tracked by a simple sensor. However, the sensor is 
faulty and outputs readings which have either drifted gradually 
from the truth or undergone a sudden jolt away from the truth 
resulting in a fixed bias in the reading (see Figure [^. 

The proposed filter algorithm operates on-line and infers 
a posterior distribution for the current target’s position using 
observations of its previous locations. Smoothing from future 
observations is not considered. The target’s trajectory is de¬ 
scribed by the process / and the sensor is subject to occasional 
faults. The sensor’s fault process, e, can be either a short term 
fixed bias or it can drift over a period of time. The, possibly 
faulty, observation at time ti is: 

Vi = f{ti) + e{ti) -I- e* 

where is zero mean, Gaussian with variance = 0.001. 
We wish to estimate target location f{t) over time. 

The processes / and e are modelled by GP kernels Kf and 
Ke respectively. We will assume that Kf is stationary and 
we will use a simple squared exponential kernel to model the 









target dynamics. However, the fault is intermittent and it starts 
at time t = Tq and ends at t = Ti. We model the fault process 
using a non-stationary kernel. Firstly, e{t) is zero over times, 
t < Tq and t > Ti, for which there is no fault. For a bias 
fault, we assume that the bias is a hxed offset and thus assert: 

Kuas{ti,tj) = fi for all Tq < < Ti . 

where /i is a scale parameter representing the magnitude of 
the bias. We assume that the drift is gradual and build the drift 
model using a squared exponential kernel: 

Kdrtft{U,tj) = A* exp 

that will be modihed shortly to accommodate drift error 
starting from zero. The parameters /i and L are the output 
and input scales, respectively, and again, Tq <k,t,<Ti. 

The bias fault causes a jump in the observation sequence 
when the fault sets in at t = Tq. However, the drift fault 
is gradual and e(T) is zero at t = Tq and discontinuous 
at Ti when the fault disappears (see Figure [^l. We use the 
Markov Region Link kernel to construct the drift fault process 
prior covariance Kqqq from iTdrift and impose the continuity 
boundary condition at Tq. Using the approach set out in 
Section |V] the prior covariance for the drift fault becomes the 

0 0 0 \ 

0 K*Q,,{TQ,Xfr 0 

iT*.f.(Xy,To) K*„QiXf,Xf) 0 ■ 

0 0 0 / 

The hrst row and column are zero matrices for times less 
than Tq, corresponding to the period before the fault starts. 
The last row and column are zero matrices for times greater 
than Ti, corresponding to times after the fault has stopped. 
The central row and column blocks are prior covariances over 
time samples Xf during which the sensor is faulty: 

Xf = {t\TQ<t<Ti} . 

Continuity of the fault process at Tq imposes Tq) = 

0. Values for obtained using Corollary [T]in Section [V| 

with Xb = Tq, Ki =0, Kb = K*q,^{Tq,Tq) = 0 and K2 = 
Kqqq. 

The bias kernel is more straight forward: 

/O 0 0 0\ 

_ 0 fi 0 

0 n 0 
\0 0 0 0 / 

with the rows and columns interpreted as for iTdrift- 

Figure shows typical tracks, observations and GP track 
estimates. The target trajectory and observation sequence are 
randomly generated, / ~ N(0, Kf) and y ^ N(0, K^, + cr^J). 
Notice that the algorithm has successfully corrected for the 
faulty observations. 

Parallel faults may also be modelled using our approach. A 
parallel fault comprises more than one basic fault occurring 
at the same time. In the following example the sensor mea¬ 
surement drifts to a new hxed bias offset. The fault can be 


block matrix: 


K* — 

^ drift — 


/o 

0 

0 

Vo 





(a) Bias 


(b) Drift 


Fig. 8. Typical target trajectory and faulty observations for both bias and 
drift faults. 


modelled using a drift fault, without fault correction, followed 
immediately by a biased fault. 

A SICK laser range sensor is used to track a person. The 
sensor is subject to a knock as it tracks the person. This knock 
results in the sensor drifting for a period before coming to rest. 
The sensor data thus exhibits a drift followed by a constant 
bias. Figure shows a typical 180 bearing range scan at a 
single time instance during the tracking procedure. Figure [T0| 
shows the data obtained from our sensor (transformed to the 
laboratory centred Cartesian coordinates) and also the target’s 
true trajectory obtained by using objects in the environment 
with known location. 



Fig. 9. Typical SICK sensor range/bearing scan of the laboratory. 


We use the MRL kernel to splice together both the drift 
kernel and the bias kernel and assert that the target observa¬ 
tions are continuous at the transition from drift to bias fault. 
Both the transition time and the kernel induced variance, Kb, 
at that time are parameters of our model. 

Figure 11 shows the ground truth and the one standard 
deviation estimate if no fault is assumed as the person moves 
from right to left. The magnitude of the error is under¬ 
estimated. 

Figure 12 shows the GP on-line estimated trajectory as 
well as an estimate of the fault. Both estimates are correctly 
determined in this case. 





















Fig. 10. True and observed target trajectoi^ obtained using SICK range 
sensor. 



Fig. 11. GP target trajectory estimate obtained by ignoring fault. 


VIII. Application 2: EOG Artifact Removal From 
EEG Signals 

In this example we detect and remove EOG artifacts from 
EEG signals by modelling the artifacts as drift type faults. The 
recovery of the EEG signal is often treated as a blind source 
separation problem 11 where ICA identihes the separate 
artifact free EEG signal (which we refer to as EEG*) and 
the EOG signal. In our approach we use explicit models for 
the EEG* and EOG signal and stipulate that these signals are 
independent. 

The “observed” EEG signal, y, is a mixture of EEG* and 
EOG artifact signals. The EEG* signal, Seeg*, is modelled as 
the combination of a smooth function vrieeg* (generated by a 
GP with prior covariance Keeg*) and i.i.d distributed residuals 
reeg*- The EOG artifact, Seog, is modelled as a piece-wise 
smooth function, rrieog (generated from a GP “fault” model 
with prior covariance Keog) and, similarly, for the residuals 

feog- 




Fig. 12. GP target trajectory estimate obtained after recovering from faults. 


where nieeg* ~ N{0,Keeg*), T eeg^ Reeg^I'j, TTl^og ^ 

N(0, iTeog) and Veog ~ N(0,i?eogr) and I is the identity 
matrix. The random vectors nieeg*, nieog, Veeg* and Veog are 
assumed to be mutually independent. The residuals, reeg* and 
reog, are considered to be part of the signals and are therefore 
not treated as noise and are not hltered out. 

We use a simple squared exponential to model the EEG* 
signal. Typically the EOG signal is zero everywhere except 
within a small time window. Within this window the EOG arti¬ 
fact can be modelled as two smooth functions (not necessarily 
monotonic) which join at a spike near the centre of the window 
(so that the EOG signal is continuous but not differentiable at 
the join). Thus, the EOG’s prior covariance Keog is chosen to 
be zero everywhere except between the artifacts start and end 
times, Tg and Tg, respectively. We use the methods outlined in 
Sectionjyjto build the EOG artifact prior covariance. Between 
the start and end times the EOG artifact signal is modelled by 
two piece-wise squared exponential kernels joined mid-way 
between Tg and Te so that they are continuous at the mid¬ 
point. Furthermore, the EOG signal is zero at Tg and Te- 

The following GP equations determine the mean and co- 
variance for the hidden variable, nieeg*'- 


^T^eeg* ) 

— [-^eeg* 5 X 







[Ke.g.{X,X)^ 

Xeog 

{X,X) 

+ 

x/]- 

'"yix) 

C0Veeg*(x*) 

-—■ -^eeg* ) 







Xeeg^ (^*5 X ) X 







[i^ee5.(X,X) + 

Xeog 

{X,X) 

+ 

x/r 

-'x 


Xe.eg* (^+5 X ) 







-Seeg* — nigeg* “t“ reeg* , 
^eog — nieog “f '^eog , 
y — ^eeg* “f ^eog - 


where cr^ = Reeg* + Reog- Similar expressions can be 
obtained for nieog- 

To track the EEG signal our algorithm determines Seeg* 
sequentially over time. When is the current time and X is 















the previous times at which data points were obtained then: 

p{Si,eg*ix,),Si,og{x*) \ y(x*),meeg*{x*),meogix,)) 

OCp{y{Xt) I Seeg*iXt),SeogiXt},7heeg,iXt},7heog(Xt)) 

X p(Seeg* (:r * ) 5 Sgog (:r* ) | UT-eeg* (x* ) , (x* )) 

OC 5l/(n:.),o„g.{2;.) + Seos(x.) 

X p{Seogix,) I rheog{Xt)) 

X p{Seeg*iXt) \ meeg*{Xt)) . 


Marginalising Seog' 

) I ) 5 ) 5 ‘^eog ) ) 

OC p{y{x^) - Seeg*{x^) \ meogix*)) 
X p{Seeg*ix^) \ meeg*ix»)) . 


In general, when s is Gaussian distributed then its mean, s, is 
the solution to: 

dlogpjs I ■) ^ 
ds 

and its variance, Var(s), is given by: 


Var(s) = -E 


d'^logpjs I ■) 


interval for the artifact free EEG* signal and the EGG artifact 
obtained using our algorithm. Eigure shows the mean 
difference between the original EEG signal and the inferred 
EEG* signal, indicating the expected proportion of the original 
signal that is retained in the EEG*. 



Thus, dehning P*{x:t) = CoVeeg*(a:*) + R. 


^eeg* 


and 


p* 


"eeg* 


and: 


- COVgo^ 

p. 

(x*) = 


{Xif, ) -f Reog- 

eeg*i.X*){y{xRj - TOeog(x*)) + P*^g{xR)meeg*{x 


Fig. 13. EEG signal (crosses) and 1 standard error confidence intervals for 
the EEG* (left panel) and EOG (right panel) signals obtained using the GP 
mixture model approach. 


Var, 


eeg^ 


(x*) = 


PLg*{x*) + Peogix*) 


PL„^{X*) + Peon{x 


eeg* \ 


p* 


eeg^ ^^*)Peog 

Similar reasoning leads us to similar expressions for the EOG 
artifact signal: 


^eog 


, . Peogix*)iyix^) - meeg*{x^)) + P*egAx*)meogix*) 

(X*) = --2---" - 


Peeg*ix, 


)+PSogix, 


) 


and: 


Vareog(x*) = 


P* (r 

eeg* 


)+PLnix* 


Peeg*ix*)P:og{x*) 


These expressions for Seeg* and Seog determine the proportion 
of the EEG signal residual that is assigned to the EOG artifact 
signal and also to the artifact free EEG signal (EEG*). 

Our model requires eight hyperparameters, collectively re¬ 
ferred to as 6 : the scale heights and scale lengths for the GP 
models (we assume that both parts of the EOG model have 
the same scale heights and lengths); the artifact start and end 
times and also the residual variances Reeg* and R^og- The 
likelihood used in Equation to determine a distribution over 
the hyperparameter values is given by: 



p{y{xt) I 0 ) = N[y{x^);meeg*,e{xt) + meog,eix^), 

PLgA^*) + PLg^eM] • 

The hyperparameters are marginalised using Monte-Carlo 
sampling. 

Eigure [TI] shows a typical EEG signal which is corrupted by 
EOG artifacts. It also shows the one standard error conhdence 


Fig. 14. Original EEG signal (dots) and difference (line) between original 
signal and the mean EEG* obtained using the GP mixture model approach. 


IX. Conclusions 

This paper has presented a novel approach for non¬ 
stationary Gaussian processes across boundaries (i.e. change- 
points for ID signals). It builds piece-wise stationary prior 
covariance matrices from stationary Gaussian process kernels 
and, where appropriate, asserts function continuity or conti¬ 
nuity of any function derivative at the region boundaries. The 
approach has been successfully demonstrated on sensor fault 
detection and recovery and also on EEG signal tracking. 
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